Module 13 Lecture - Complexity in ANCOVA

Analysis of Variance

Quinton Quagliano, M.S., C.S.P

Department of Educational Psychology

1 Overview and Introduction

Agenda

1 Overview and Introduction

2 The F Distribution

3 The F Ratio

4 One-way ANOVA

5 Conclusion

1.1 Textbook Learning Objectives

  • Interpret the F probability distribution as the number of groups and the sample size change.
  • Discuss one use for the F distribution: one-way ANOVA.
  • Conduct and interpret one-way ANOVA.

1.2 Instructor Learning Objectives

  • Appreciate the ANOVA as another test in your toolbox alongside the other tests for dealing with categorical and continuous outcome variables
  • Understand the F-distribution as another practical distribution for deriving inferences and conclusions

1.3 Introduction

  • Discuss: When we needed to compare the averages of two groups on a continuous variable, what statistical test would we use?
  • While we’ve already covered how to test differences between two groups, now we need to consider the scenario with three or more groups on some outcome
    • Examples:
      • Comparing class outcomes for those in hybrid, in-person, or online sections
      • Comparing students on three different tracks in high school: regular, advanced, or remedial
  • When we are comparing three or more categorical groups on some numeric, continuous outcome, we will employ the one-way ANOVA, where ANOVA stands for Analysis of Variance
    • ANOVA is a broad family of techniques with many applications and extensions
    • We are looking at the one-way ANOVA as a specific use of this technique
  • Important: This is very similar to how we introduced simple linear regression - there are many more advancements on top of what you learn in this class, which you will encounter in more advanced statistics classes!
  • Why not use a bunch of t-tests across the 3 groups instead of this new, more complicated method?
    • Problem: Running multiple tests inflates our Type I error rate
    • Pairwise comparisons are instead done as part of ANOVA in post-hoc testing, with corrections that control this error rate, but that won’t be covered here
  • Discuss: Describe what Type I errors are
  • Our previous tests all made use of specific, practical distributions for determining significance:
    • z-tests \(\rightarrow\) normal distribution
    • t-tests \(\rightarrow\) t-distribution
    • \(\chi^2\) tests \(\rightarrow\) \(\chi^2\) distribution
    • The ANOVA family will introduce a new distribution: the F-distribution
  • We’ll start by describing the F-distribution itself, then the F-ratio/statistic that we compute, and then finally the application within the One-way ANOVA
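As a quick numeric aside on the Type I error problem mentioned above: if each pairwise t-test uses \(\alpha = .05\), the chance of at least one false positive grows with the number of tests. A minimal sketch in Python (assuming the tests are independent, which is a simplification):

```python
# Familywise Type I error rate for m independent tests, each run at
# significance level alpha: P(at least one false positive) = 1 - (1 - alpha)^m.
alpha = 0.05

# Comparing 3 groups pairwise requires 3 t-tests (A-B, A-C, B-C).
m = 3
familywise = 1 - (1 - alpha) ** m
print(round(familywise, 4))  # 0.1426 -- nearly triple the nominal 5%
```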

2 The F Distribution


2.1 Introduction

  • The F-distribution was developed by Sir Ronald Fisher and is a unique distribution applied often for the ANOVA family of statistical tests
    • In notation it is given as \(F \sim F_{df(num),df(denom)}\) where:
    • \(df(num) \rightarrow df_{between}\) and \(df(denom) \rightarrow df_{within}\)
    • Example: \(F \sim F_{2, 24}\) has:
      • \(df_{between} = 2\)
      • \(df_{within} = 24\)
  • Important: Hold on just a second, *two* different degrees of freedom? Yes! We'll discuss more about why that is in our discussion on the F-ratio itself
  • The F-distribution is derived from the t-distribution: when the numerator degrees of freedom equal 1, the F-distribution consists of squared values from the t-distribution (\(F_{1, df} = t^2_{df}\))
    • How exactly Fisher proved that is way beyond the scope of this class
  • Discuss: Recall, what exactly is the distinction that makes the t-distribution different and useful over the normal distribution? The F-statistic benefits from this the same!

2.2 Additional Facts About the F Distribution

  • The curve is not symmetrical but skewed to the right.
    • Thus, it has a more similar look to the chi-squared distribution, compared to the t-distribution
  • There is a different curve for each set of dfs.
    • Once again, similar to chi-squared
  • The F statistic is greater than or equal to zero.
  • Discuss: After you work through the F-ratio section and calculation below, return to this section and try to explain, using the formula, why this F is always greater than or equal to zero
  • As the degrees of freedom for the numerator and for the denominator get larger, the curve approximates the normal distribution.
    • Remember, the \(df_{between} \rightarrow\) numerator and the \(df_{within} \rightarrow\) denominator
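These properties can be checked directly with SciPy's F-distribution object; a small sketch using the \(F_{2,24}\) example from earlier (the df values are just those from the slides' example):

```python
from scipy import stats

# F-distribution with df_between = 2 (numerator) and df_within = 24 (denominator).
f_dist = stats.f(dfn=2, dfd=24)

# The F statistic is never negative: there is no probability below zero.
print(f_dist.cdf(0))  # 0.0

# Right skew: the mean sits to the right of the median.
print(f_dist.mean() > f_dist.median())  # True

# Right-tail critical value at alpha = .05 (compare to an F table).
print(round(f_dist.ppf(0.95), 2))  # 3.4
```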

3 The F Ratio


3.1 Introduction

  • The F-ratio is just another name for the F statistic that results from the test we run
    • The F-ratio/statistic we get is a ratio of:
      • The variances between samples
      • The variances within samples
      • Its formula is given as:

\[ F = \frac{MS_{between}}{MS_{within}} \]

  • We’ll describe what the \(MS\) means here in a second

3.2 Between & Within Variation

  • For variances between samples, we are considering how much difference there is between the 3 or more groups we are comparing to one another
    • This may also be known as “variation due to treatment” or “explained variation”
    • This will be represented computationally as \(MS_{between}\), the mean square between groups
    • This is the numerator of the F-ratio formula
  • In the case of variances within samples, we examine how much variation there is within each of the groups we are comparing to one another
    • This may also be known as “variation due to error” or “unexplained variation”
    • This will be represented as \(MS_{within}\) in the formulas that follow
    • This is the denominator of the F-ratio formula

3.3 Calculation of the F-ratio/statistic

  • We are not going to calculate F by hand, as it is very time-consuming, albeit plenty possible
    • However, you should still follow along and make sure you can see the flow of information as we substitute numbers in these calculations

Overall F Calculation

\[ F = \frac{MS_{between}}{MS_{within}} \]

Between-group Calculations

\[ MS_{between} = \frac{SS_{between}}{df_{between}} \]

\[ df_{between} = k - 1 \]

\[ SS_{between} = \sum{\left[\frac{(s_j)^2}{n_j}\right]} - \frac{(\sum{s_j})^2}{n} \]

\[ SS_{total} = \sum{x^2} - \frac{(\sum{x})^2}{n} \]

  • Where
    • \(SS_{between}:\) Sum of squares between groups
    • \(SS_{total}:\) Sum of squares total
    • \(df_{between}:\) degrees of freedom between groups
    • \(k:\) the number of groups
    • \(n:\) total sample size
    • \(n_j:\) the size of \(j^{th}\) group
    • \(s_j:\) sum of values of \(j^{th}\) group

Within-group Calculations

\[ MS_{within} = \frac{SS_{within}}{df_{within}} \]

\[ df_{within} = n - k \]

\[ SS_{within} = SS_{total} - SS_{between} \]

  • Where
    • \(SS_{within}:\) Sum of squares within groups
    • \(SS_{between}:\) Sum of squares between groups
    • \(SS_{total}:\) Sum of squares total
    • \(df_{within}:\) degrees of freedom within groups
    • \(n:\) total sample size
    • \(k:\) the number of groups
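To see the flow of numbers through these formulas, here is a hand-calculation sketch in Python; the three groups of scores are made up purely for illustration:

```python
# Hypothetical data: three groups (k = 3), n = 15 total observations.
groups = [
    [4, 5, 6, 5, 5],      # group 1
    [7, 8, 6, 7, 7],      # group 2
    [10, 9, 11, 10, 10],  # group 3
]

k = len(groups)
all_x = [x for g in groups for x in g]
n = len(all_x)

# SS_total = sum(x^2) - (sum(x))^2 / n
ss_total = sum(x**2 for x in all_x) - sum(all_x)**2 / n

# SS_between = sum(s_j^2 / n_j) - (sum(s_j))^2 / n, where s_j is the
# sum of values in group j and n_j is its size.
ss_between = sum(sum(g)**2 / len(g) for g in groups) - sum(all_x)**2 / n

# SS_within = SS_total - SS_between
ss_within = ss_total - ss_between

df_between, df_within = k - 1, n - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within

F = ms_between / ms_within
print(round(F, 2))  # 63.33
```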

3.4 Conclusions with the F-ratio/statistic

  • Discuss: Review: describe the general process of hypothesis testing, starting with setting the null and alternative hypotheses and ending with rejecting or retaining the null hypothesis. Try to use the words 'rare event' at some point in your explanation.
  • The F statistic, which we derive from the formula for whatever test we are using, is best described as a ratio or fraction (as we described above!)
    • Much like previous test statistics, we use this to determine if our results are rare enough to be called unlikely if the null hypothesis is true
    • As usual, the F statistic has a corresponding p-value that tells us the probability of results due to chance under the null hypothesis
  • Question: What value do we compare p against to determine if we can reject the null hypothesis?
    • A) Alpha
    • B) Omega
    • C) Confidence Level
    • D) Beta
  • For significance testing of the F-ratio, our one-way ANOVA test will always be right-tailed due to the ratio nature of the statistic, with larger numbers suggesting greater variation between groups relative to the variation within groups!
    • Put another way, we are trying to see if our F statistic is far enough out on the right tail to say that it is significant!
  • Discuss: Which other distribution also always had a right-tailed application? Explain why that is using the relevant formula for that statistic.
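In practice, we read the p-value off the right tail of the F-distribution. A sketch with SciPy, reusing the \(F_{2,24}\) degrees of freedom from the earlier example (the observed F value is made up):

```python
from scipy import stats

# Right-tailed p-value for a hypothetical observed F statistic.
f_observed = 4.5
p_value = stats.f.sf(f_observed, dfn=2, dfd=24)  # sf = 1 - cdf (right tail)

# Compare against alpha to decide whether to reject H0.
alpha = 0.05
print(p_value < alpha)  # True
```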

4 One-way ANOVA


4.1 Introduction

  • Important: In the prior calculations, we covered the mathematical computation in a one-way ANOVA, but this next part will cover the more conceptual side of things.
  • The goal of the one-way ANOVA is to determine if there are significant differences between multiple group means
    • This is done by examining the variances of the groups (as we previously discussed as part of The F Ratio)
  • This is how we will practically use the F-distribution

4.2 Assumptions

  • Much like the other tests, we need to be mindful of several assumptions that underlie this statistical test

  • These assumptions are:

    • Each population from which a sample is taken is assumed to be normal.
    • All samples are randomly selected and independent.
    • The populations are assumed to have equal standard deviations (or variances).
    • The factor is a categorical variable.
    • The response is a numerical variable.
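The equal-variance assumption can be checked before running the ANOVA; one common choice is Levene's test, available in SciPy (the three samples below are hypothetical):

```python
from scipy import stats

# Hypothetical samples from three groups.
g1 = [4, 5, 6, 5, 5]
g2 = [7, 8, 6, 7, 7]
g3 = [10, 9, 11, 10, 10]

# Levene's test: H0 is that all groups have equal variances.
stat, p = stats.levene(g1, g2, g3)

# A large p-value means no evidence against the equal-variance assumption.
print(p > 0.05)  # True
```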

4.3 Null and Alternative Hypotheses

  • In a one-way ANOVA, the null and alternative hypotheses are as follows:
    • \(H_0: \mu_1 = \mu_2 = \mu_3 ... = \mu_k\)
    • \(H_A:\) At least two of the group means \(\mu_1,\mu_2,\mu_3,..., \mu_k\) are not equal
  • Important: Be *very* careful in how you phrase the alternative hypothesis for this test, it looks a little bit different than the other tests we've covered before!
  • Effectively, a statistically significant F-statistic only tells us that a difference exists somewhere, but not where that difference lies
    • More explanation on this in the Extensions section!
  • Discuss: What is the name of the type of the following plots?
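With the hypotheses set up this way, the whole test can be run in one call with SciPy's `f_oneway` (the exam scores below are hypothetical, echoing the hybrid/in-person/online example from the introduction):

```python
from scipy import stats

# Hypothetical exam scores for three course formats.
hybrid    = [78, 82, 85, 80, 79]
in_person = [88, 90, 85, 87, 91]
online    = [70, 72, 75, 68, 74]

# One-way ANOVA: H0 is mu_hybrid = mu_in_person = mu_online.
f_stat, p_value = stats.f_oneway(hybrid, in_person, online)

# A significant result says at least two means differ, not which ones.
print(p_value < 0.05)  # True
```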

4.4 Extensions

  • Of course, this is only the tip of the iceberg

  • One question that the ANOVA alone doesn't answer: which of the groups are different?

    • If we have a significant one-way ANOVA test, then we can follow up with post-hoc testing, which can tell us which groups exactly are different than one another
  • There is also the two-way ANOVA, which allows us to compare two or more independent variables and their combined and interacting impact on a single dependent variable

    • While beyond the scope of this class, interactions are an incredibly important part of a lot of more complicated studies
  • Finally, there is the repeated-measures ANOVA, which compares measurements taken on the same sample across multiple time points

  • Discuss: Which of the previous tests we talked about dealt with comparing two sets of measurements, taken at two separate time points, on the same sample?
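As a taste of post-hoc testing, recent versions of SciPy provide Tukey's HSD, which makes every pairwise comparison while keeping the familywise Type I error rate controlled; the data below are hypothetical:

```python
from scipy import stats

# Hypothetical scores for the three course formats.
hybrid    = [78, 82, 85, 80, 79]
in_person = [88, 90, 85, 87, 91]
online    = [70, 72, 75, 68, 74]

# Tukey's HSD adjusts for multiple comparisons, unlike raw pairwise t-tests.
result = stats.tukey_hsd(hybrid, in_person, online)

# result.pvalue[i][j] holds the adjusted p-value for groups i vs. j.
print(result.pvalue[0][1] < 0.05)  # hybrid vs. in-person: True
```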

5 Conclusion


5.1 Recap

  • We can think of a one-way ANOVA as being similar to an independent-samples t-test, but for more than 2 groups. Just note the differences in the null/alternative hypotheses setup

  • There are some important extensions and other applications of the ANOVA that we did not comprehensively cover here. However, they do still employ the F-distribution and have the underlying focus on comparing variances to make conclusions about differences between groups

  • The F-distribution adds to the other practical distributions employed as part of hypothesis testing, such as the t-distribution and chi-squared distribution. Just like those, it allows us to conclude whether a null hypothesis can be rejected or retained.

5.2 Lecture Check-in

  • Make sure to complete any lecture check-in tasks associated with this lecture!
